Generalized word shift graphs: a method for visualizing and explaining pairwise comparisons between texts
Abstract
A common task in computational text analyses is to quantify how two corpora differ according to a measurement like word frequency, sentiment, or information content. However, collapsing the texts' rich stories into a single number is often conceptually perilous, and it is difficult to confidently interpret interesting or unexpected textual patterns without looming concerns about data artifacts or measurement validity. To better capture fine-grained differences between texts, we introduce generalized word shift graphs, visualizations which yield a meaningful and interpretable summary of how individual words contribute to the variation between two texts for any measure that can be formulated as a weighted average. We show that this framework naturally encompasses many of the most commonly used approaches for comparing texts, including relative frequencies, dictionary scores, and entropy-based measures like the Kullback–Leibler and Jensen–Shannon divergences. Through a diverse set of case studies ranging from presidential speeches to tweets posted in urban green spaces, we demonstrate how generalized word shift graphs can be flexibly applied across domains for diagnostic investigation, hypothesis generation, and substantive interpretation. By providing a detailed lens into textual shifts between corpora, generalized word shift graphs help computational social scientists, digital humanists, and other text analysis practitioners fashion more robust scientific narratives.
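The abstract's central idea, that a weighted-average measure can be split into per-word contributions, can be illustrated with a minimal sketch. It assumes the simplest case, in which both texts are scored with one shared dictionary, so that each word contributes (p2 − p1) × score to the difference between the two averages. The function name, the toy scores, and the toy texts are hypothetical and are not taken from the paper or from any published implementation.

```python
from collections import Counter


def word_shift_contributions(tokens_1, tokens_2, scores):
    """Split the difference of two weighted averages into per-word terms.

    Assumption (not from the source): both texts share one score
    dictionary, and words without a score are ignored.
    """
    counts_1 = Counter(t for t in tokens_1 if t in scores)
    counts_2 = Counter(t for t in tokens_2 if t in scores)
    total_1 = sum(counts_1.values())
    total_2 = sum(counts_2.values())
    if total_1 == 0 or total_2 == 0:
        raise ValueError("each text needs at least one scored word")

    contributions = {}
    for word in set(counts_1) | set(counts_2):
        p1 = counts_1[word] / total_1  # relative frequency in text 1
        p2 = counts_2[word] / total_2  # relative frequency in text 2
        contributions[word] = (p2 - p1) * scores[word]
    return contributions


# Hypothetical toy data, for illustration only.
scores = {"happy": 8.0, "sad": 2.0, "rain": 5.0}
text_1 = ["happy", "happy", "rain", "sad"]
text_2 = ["sad", "sad", "rain", "happy"]

contributions = word_shift_contributions(text_1, text_2, scores)

# Rank words by the size of their contribution, largest first.
for word, delta in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{word}: {delta:+.2f}")
```

The contributions sum to the difference between the two texts' average scores, which is what lets a ranked bar chart of these terms explain a single summary number word by word.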
Similar resources
Cohesion and cohesive devices in a contrastive analysis between GE and ESP texts
The present study was an attempt to conduct a contrastive analysis between General English (GE) and English for Specific Purposes (ESP) texts in terms of cohesion and cohesive devices. To this end, thirty texts from different ESP and GE textbooks were randomly selected. Then they were analyzed manually to find the frequency of cohesive devices. Cohesive devices include reference, substitution, ...
A heuristic rating estimation algorithm for the pairwise comparisons method
The pairwise comparisons method is a powerful tool used for establishing the relative order between different concepts in situations in which it is difficult (or sometimes even impossible) to provide an explicit rating. Appropriate ratings are determined by solving an eigenvalue problem for the pairwise comparisons matrix. This study presents a new iterative heuristic rating estimation algorithm that...
Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs
Andrew Reagan, Brian Tivnan, Jake Ryland Williams, Christopher M. Danforth, and Peter Sheridan Dodds. Department of Mathematics & Statistics, Computational Story Lab, & the Vermont Advanced Computing Core, University of Vermont, Burlington, VT, 05405; Vermont Complex Systems Center, University of Vermont, Burlington, VT, 05405; The MITRE Corporation, 7525 Colshire Drive, McLean, VA, 22102 ...
SRCS: Statistical Ranking Color Scheme for Visualizing Parameterized Multiple Pairwise Comparisons with R
The problem of comparing a new solution method against existing ones to find statistically significant differences arises very often in sciences and engineering. When the problem instance being solved is defined by several parameters, assessing a number of methods with respect to many problem configurations simultaneously becomes a hard task. Some visualization technique is required for present...
Pairwise Comparisons Simplified
This study examines the notion of generators of a pairwise comparisons matrix. Such an approach decreases the number of pairwise comparisons from n · (n − 1) to n − 1. An algorithm for reconstructing the PC matrix from its set of generators is presented.
Journal
Journal title: EPJ Data Science
Year: 2021
ISSN: 2193-1127
DOI: https://doi.org/10.1140/epjds/s13688-021-00260-3